Distributional composition using higher-order dependency vectors
This paper concerns how to apply compositional methods to vectors based on grammatical dependency relation vectors. We demonstrate the potential of a novel approach which uses higher-order grammatical dependency relations as features. We apply the approach to adjective-noun compounds with promising results in the prediction of the vectors for (held-out) observed phrases
Method51 for mining insight from social media datasets
We present Method51, a social media analysis software platform with a set of accompanying methodologies. We discuss a series of case studies illustrating the platform’s application, and motivating our methodological proposals
Learning to distinguish hypernyms and co-hyponyms
This work is concerned with distinguishing different semantic relations which exist between distributionally similar words. We compare a novel approach based on training a linear Support Vector Machine on pairs of feature vectors with state-of-the-art methods based on distributional similarity. We show that the new supervised approach does better even when there is minimal information about the target words in the training data, giving a 15% reduction in error rate over unsupervised approaches
Anti-social media
To inform the discussion over free speech and hate speech, this study examines the way racial, religious and ethnic slurs are employed on Twitter.
Executive summary: How to define the limits of free speech is a central debate in most modern democracies. This is particularly difficult in relation to hateful, abusive and racist speech. The pattern of hate speech is complex. But there is increasing focus on the volume and nature of hateful or racist speech taking place online; and new modes of communication mean it is easier than ever to find and capture this type of language.
How and whether to respond to certain types of language use without curbing freedom of expression in this online space is a significant question for policy makers, civil society groups, law enforcement agencies and others. This short study aims to inform these difficult decisions by examining specifically the way racial and ethnic slurs (henceforth, ‘slurs’) are used on the popular microblogging site, Twitter.
Slurs relate specifically to a set of words, terms, or nicknames which are used to refer to groups in a society in a derogatory, pejorative or insulting manner. Slurs can be used in a hateful way, but that is not always the case. Therefore, this research is not about hate speech per se, but about epistemology and linguistics: word use and meaning.
In this study, we aim to answer the following two questions:
(a) In what ways are slurs being used on Twitter, and in what volume?
(b) What is the potential for automated machine learning techniques to accurately identify and classify slurs?
Aligning packed dependency trees: a theory of composition for distributional semantics
We present a new framework for compositional distributional semantics in which the distributional contexts of lexemes are expressed in terms of anchored packed dependency trees. We show that these structures have the potential to capture the full sentential contexts of a lexeme and provide a uniform basis for the composition of distributional knowledge in a way that captures both mutual disambiguation and generalization
Improving Semantic Composition with Offset Inference
Count-based distributional semantic models suffer from sparsity due to unobserved but plausible co-occurrences in any text collection. This problem is amplified for models like Anchored Packed Trees (APTs) that take the grammatical type of a co-occurrence into account. We therefore introduce a novel form of distributional inference that exploits the rich type structure in APTs and infers missing data by the same mechanism that is used for semantic composition.
Comment: to appear at ACL 2017 (short papers)
Improving sparse word representations with distributional inference for semantic composition
Distributional models are derived from co-occurrences in a corpus, where only a small proportion of all possible plausible co-occurrences will be observed. This results in a very sparse vector space, requiring a mechanism for inferring missing knowledge. Most methods face this challenge in ways that render the resulting word representations uninterpretable, with the consequence that semantic composition becomes hard to model. In this paper we explore an alternative which involves explicitly inferring unobserved co-occurrences using the distributional neighbourhood. We show that distributional inference improves sparse word representations on several word similarity benchmarks and demonstrate that our model is competitive with the state-of-the-art for adjective-noun, noun-noun and verb-object compositions while being fully interpretable.
Analysing trade-offs and synergies between SDGs for urban development, food security and poverty alleviation in rapidly changing peri-urban areas: a tool to support inclusive urban planning
Transitional peri-urban contexts are frontiers for sustainable development where land-use change involves negotiation and contestation between diverse interest groups. Multiple, complex trade-offs between outcomes emerge which have both negative and positive impacts on progress towards achieving Sustainable Development Goals (SDGs). These trade-offs are often overlooked in policy and planning processes which depend on top-down expert perspectives and rely on coarse-grained aggregate data which does not reflect complex peri-urban dynamics or the rapid pace of change. Tools are required to address this gap, integrate data from diverse perspectives and inform more inclusive planning processes. In this paper, we draw on a reinterpretation of empirical data concerned with land-use change and multiple dimensions of food security from the city of Wuhan in China to illustrate some of the complex trade-offs between SDGs that tend to be overlooked with current planning approaches. We then describe the development of an interactive web-based tool that implements deep learning methods for fine-grained land-use classification of high-resolution remote sensing imagery and integrates this with a flexible method for rapid trade-off analysis of land-use change scenarios. The development and potential use of the tool are illustrated using data from the Wuhan case study. This tool has the potential to support participatory planning processes by providing a platform for multiple stakeholders to explore the implications of planning decisions and land-use policies. Used alongside other planning, engagement and ecosystem service mapping tools, it can help to reveal invisible trade-offs and foreground the perspectives of diverse stakeholders. This is vital for building approaches which recognise how trade-offs between the achievement of SDGs can be influenced by development interventions.
A critique of word similarity as a method for evaluating distributional semantic models
This paper aims to re-think the role of the word similarity task in distributional semantics research. We argue that while it is a valuable tool, it should be used with care because it provides only an approximate measure of the quality of a distributional model. Word similarity evaluations assume there exists a single notion of similarity that is independent of a particular application. Further, the small size and low inter-annotator agreement of existing data sets make it challenging to find significant differences between models.
Disrupting Daesh: measuring takedown of online terrorist material and its impacts
This report seeks to contribute to public and policy debates on the value of social media disruption activity with respect to terrorist material. We look in particular at aggressive account and content takedown, with the aim of accurately measuring this activity and its impacts. Our findings challenge the notion that Twitter remains a conducive space for Islamic State (IS) accounts and communities to flourish, although IS continues to distribute propaganda through this channel. However, not all jihadists on Twitter are subject to the same high levels of disruption as IS, and we show that there is differential disruption taking place. IS’s and other jihadists’ online activity was never solely restricted to Twitter. Twitter is just one node in a wider jihadist social media ecology. We describe and discuss this, and supply some preliminary analysis of disruption trends in this area